A Wikipedia-LDA Model for Entity Linking with Batch Size Changing Instance Selection
Abstract
Entity linking maps name mentions in context to entries in a knowledge base by resolving name variations and ambiguities. In this paper, we propose two advancements for entity linking. First, a Wikipedia-LDA method models contexts as probability distributions over Wikipedia categories, which allows context similarity to be measured in a semantic space rather than in the literal term space used by other studies for disambiguation. Second, to automate training instance annotation without compromising accuracy, an instance selection strategy selects an informative, representative, and diverse subset from an auto-generated dataset. During the iterative selection process, the batch size at each iteration changes according to the variance of the classifier's confidence or accuracy between successive batches, which not only makes the selection insensitive to the initial batch size but also leads to better performance. Each advancement individually gives a significant improvement in entity linking; together, they achieve the highest performance on the KBP-10 task. Being a generic approach, the batch size changing method can also benefit active learning for other tasks.
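The contrast the abstract draws between a literal term space and a semantic space can be illustrated with a minimal sketch. It is not the paper's implementation: the category distributions below are hypothetical hand-written values standing in for the output of a trained topic model, the example sentences are invented, and cosine similarity is an assumed choice of measure (the abstract does not name one).

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * q.get(k, 0.0) for k, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def term_vector(text):
    """Bag-of-words counts over whitespace tokens (the literal term space)."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


# Invented example contexts: a mention and a candidate knowledge-base entry.
mention_ctx = "jordan scored forty points in the final"
candidate_ctx = "the guard led chicago to a championship"

# Hypothetical distributions over Wikipedia categories. In the approach the
# abstract describes, these would be inferred by the model, not hand-written.
mention_topics = {"Basketball": 0.70, "Sports": 0.25, "Geography": 0.05}
candidate_topics = {"Basketball": 0.60, "Sports": 0.30, "Illinois": 0.10}

# The two contexts share only the word "the", so term similarity is low,
# while their category distributions overlap heavily.
term_sim = cosine(term_vector(mention_ctx), term_vector(candidate_ctx))
topic_sim = cosine(mention_topics, candidate_topics)

print(f"term-space similarity:     {term_sim:.3f}")
print(f"category-space similarity: {topic_sim:.3f}")
```

In this toy case the two contexts share almost no vocabulary, yet their category distributions are close, which is the situation where a semantic-space comparison would be expected to help disambiguation.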
Similar resources
I2R-NUS-MSRA at TAC 2011: Entity Linking
In this paper, we report the joint participation of I2R-NUS team and MSRA team in entity linking task for Knowledge Base Population at Text Analysis Conference 2011. I2R-NUS team submitted two results with the full system and the partial system for diagnosis purpose. Both results incorporate the new technologies: acronym expansion, instance selection and topic modeling proposed in our recent pa...
NLPComp in TAC 2012 Entity Linking and Slot-Filling
The NLPComp team participated in two TACKBP2012 tasks: Regular Entity Linking and Regular Slot Filling. For the entity linking task, a three-step entity linking system is developed. In the first step, a list of possible candidates is selected. Then the best candidate is identified to decide whether a link exists. In addition, a document clustering algorithm is used to group NIL queries. This s...
Estimating the Parameters for Linking Unstandardized References with the Matrix Comparator
This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator used for linking unstandardized or heterogeneously standardized references. The matrix comparator computes the aggregate similarity between the tokens (words) in a pair of references. The two most critical parameters for the matrix comparator for obtaining the best linking results a...
Classifying Articles in Chinese Wikipedia with Fine-Grained Named Entity Types
Named entity classification of Wikipedia articles is a fundamental research area that can be used to automatically build large-scale corpora of named entity recognition or to support other entity processing, such as entity linking, as auxiliary tasks. This paper describes a method of classifying named entities in Chinese Wikipedia with fine-grained types. We considered multi-faceted information...
Entity Linking with people entity on Wikipedia
This paper introduces a new model that uses named entity recognition, coreference resolution, and entity linking techniques, to approach the task of linking people entities on Wikipedia people pages to their corresponding Wikipedia pages if applicable. Our task is different from general and traditional entity linking because we are working in a limited domain, namely, people entities, and we ar...